Question 1

In the Wallsend electorate at the 2003 NSW state election, the distribution of votes
(when 20% of the vote was counted) were as follows:

John Mills (Labor) Bob Georghen (Liberal) John Tongle (Shooters) Other
3024 2021 316 435
  1. Find a 95% interval estimate for the true proportion of votes in the population of Wallsend, for each candidate above , using an MCMC algorithm.

From the lecture notes, with \(n = 5796\): \[ p(\theta \mid y) \propto \theta_1^{^{3024}}\,\theta_2^{^{2021}}\,\theta_3^{^{316}}\,(1-\theta_1-\theta_2-\theta_3)^{^{435}} \]

We propose a \(beta(1512,1386)\) where \(1512 = 3024/2\) and \(1386 = 5796/2 - 1512\).

In R:

r1 <- 0.5; r2 <- 0.35; r3 <- 0.05
alpha1 <- 1512; beta1 <- 1386
alpha2 <- 1010.5; beta2 <- 1887.5
alpha3 <- 158; beta3 <- 2740

for (k in 2:10000) {
  
  rn = rbeta(1,alpha1,beta1)
  
  alph = 3024 * log(rn) + 
         435 * log(1 - rn - r2[k - 1]) - 
         3024 * log(r1[k - 1]) -
         435 * log(1 - r1[k - 1] - r2[k - 1] - r3[k - 1])
  alph = alph + 
         log(dbeta(r1[k - 1], alpha1, beta1)) - 
         log(dbeta(rn, alpha1, beta1))
  
  if (runif(1) < exp(alph)) {
    r1[k] = rn
    } else {
      r1[k] = r1[k-1]
    }
  
  rn = rbeta(1,alpha2,beta2)
  
  alph = 2021 * log(rn) + 
         435 * log(1 - rn - r1[k] - r3[k - 1]) - 
         2021 * log(r2[k - 1]) -
         435 * log(1 - r1[k - 1] - r2[k - 1] - r3[k - 1])
  alph = alph + 
         log(dbeta(r2[k - 1], alpha2, beta2)) - 
         log(dbeta(rn, alpha2, beta2))
  
  if (runif(1) < exp(alph)) {
    r2[k] = rn
    } else {
      r2[k] = r2[k-1]
    }
    
  rn = rbeta(1,alpha3,beta3)
  
  alph = 316 * log(rn) + 
         435 * log(1 - rn - r1[k] - r2[k]) - 
         316 * log(r3[k - 1]) -
         435 * log(1 - r1[k] - r2[k] - r3[k - 1])
  alph = alph + 
         log(dbeta(r3[k - 1], alpha3, beta3)) - 
         log(dbeta(rn, alpha3, beta3))
  
  if (runif(1) < exp(alph)) {
    r3[k] = rn
    } else {
      r3[k] = r3[k-1]
    }
}

hist(r1)

quantile(r1,c(0.025,0.975))
##      2.5%     97.5% 
## 0.5035750 0.5401427
plot.ts(r1)

hist(r2)

quantile(r2,c(0.025,0.975))
##      2.5%     97.5% 
## 0.3327194 0.3645715
plot.ts(r1)

hist(r2)

quantile(r3,c(0.025,0.975))
##       2.5%      97.5% 
## 0.04778726 0.06155338
plot.ts(r3)

y = c(3024,2021,316,435)
r1 = 0.5; r2 = 0.35; r3 = 0.5

for (k in 2:10000) {
  r1[k] = (1 - r2[k - 1] - r3[k - 1]) * rbeta(1, y[1], y[4])
  r2[k] = (1 - r1[k] - r3[k - 1]) * rbeta(1, y[2], y[4])
  r3[k] = (1 - r1[k] - r2[k]) * rbeta(1, y[3], y[4])
}
hist(r1)

quantile(r1,c(0.025,0.975))
##      2.5%     97.5% 
## 0.5092528 0.5351261
plot.ts(r1)


hist(r2)

quantile(r2,c(0.025,0.975))
##      2.5%     97.5% 
## 0.3360764 0.3609027
plot.ts(r2)


hist(r2)

quantile(r3,c(0.025,0.975))
##       2.5%      97.5% 
## 0.04883673 0.06055887
plot.ts(r3)


quantile(r1-r2,c(0.025,0.975))
##      2.5%     97.5% 
## 0.1499035 0.1975589
  1. Find a 95% interval estimate for the ‘Other’ candidates not mentioned above.
quantile(1-r1-r2-r3,c(0.025,0.975))
##      2.5%     97.5% 
## 0.0683673 0.0819158
  1. Estimate Pr (Labor > Liberal | y) in Wallsend.
mean(r1>r2)
## [1] 0.9999
  1. Find a 95% interval for \(\theta_{Labor} - \theta_{Liberal}\).
quantile(r1-r2, c(0.025,0.975))
##      2.5%     97.5% 
## 0.1499035 0.1975589
  1. Estimate Pr(Other > Shooters)
mean(1-r1-r2-r3>r3)
## [1] 0.9999

Question 2

Consider the data from the lecture:

George Bush Michael Dukakis Other
Votes 727 583 137
Parameter θ₁ θ₂ θ₃
pt. est. 0.502 0.403 0.095
y = c(727,583,137)
r1 = 0.5; r2 = 0.35

for (k in 2:10000) {
  r1[k] = (1 - r2[k - 1]) * rbeta(1, y[1], y[3])
  r2[k] = (1 - r1[k]) * rbeta(1, y[2], y[3])
}
hist(r1)

quantile(r1,c(0.025,0.975))
##      2.5%     97.5% 
## 0.4756807 0.5271495
plot.ts(r1)


hist(r2)

quantile(r2,c(0.025,0.975))
##      2.5%     97.5% 
## 0.3786701 0.4289472
plot.ts(r2)

  1. Derive the posterior density under a flat Dirichlet(1,1,1).

Check lecture notes: easy to derive using the process described there.

  1. Update the R code for MCMC in Q2(a) to find point and interval estimates for each
    parameter plus an interval estimate of \(\theta_1 - \theta_2\) and point estimate of Pr(\(\theta_1 > \theta_2 \mid y\))
  2. It is in fact possible to simulate from a Dirichlet distribution using regular MC methods, as explained in Appendix A of BDA. An alternative proposal follows below, from an
    earlier suggestion solution. Check that it is inadequate and explain why.